Efficient Dynamic Local Enumeration for HPF

Authors

  • Will Denissen
  • Henk J. Sips

Abstract

In translating HPF programs, a compiler has to generate local iteration and communication sets. Apart from local enumeration, local storage compression is an issue, because in HPF, array alignment functions can introduce local storage inefficiencies. Storage compression, however, must not lead to serious performance penalties. A problem in semi-automatic translation is that a compiler should generate efficient code in all cases where the user may expect an efficient translation (no surprises). In current compilers, however, this turns out not always to be the case. A major cause of these inefficiencies is that compilers use the same fixed enumeration scheme in all cases. In this paper, we present an efficient dynamic local enumeration method, which always selects the optimal solution at run time and requires no code duplication. The method is compared with the PGI and Adaptor compilers.

Dynamic selection of enumeration orders

Once a data mapping function is given in an HPF program, we know exactly which element of an array is owned by which processor. However, the storage of all elements owned by a single processor still needs to be determined. When a compiler generates code for local iteration or communication sets, it needs to enumerate the local elements efficiently, that is, with only a small overhead compared to the work inside the iteration space. Because local elements must be referenced in both phases, storage and enumeration are closely linked. An overview of the basic schemes for local storage and local enumeration is given in [1].

In a cyclic(m) distribution, the template data elements are distributed in blocks of size m in a round-robin fashion. The relation between an array index i and the row, column, and processor tuple (r, c, p) is given by the position equation [1]. To avoid inefficient memory storage, local storage is compressed by removing unused template elements.
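
The position equation itself is not reproduced in this text. As a hedged sketch, assume the common formulation in which array element i is aligned to template cell t = a·i + b (alignment stride a ≥ 1, offset b ≥ 0) and the template is distributed cyclic(m) over P processors. Every cell then decomposes as t = r·P·m + p·m + c with 0 ≤ c < m and 0 ≤ p < P. The function name `position` and the parameters a, b and P are illustrative assumptions, not the notation of [1].

```python
from typing import NamedTuple


class Position(NamedTuple):
    row: int   # r: round of the round-robin distribution the block falls in
    col: int   # c: offset of the cell inside its block of size m
    proc: int  # p: processor that owns the block


def position(i: int, a: int, b: int, m: int, P: int) -> Position:
    """Map array index i to (r, c, p) for a cyclic(m) distribution.

    Assumed (hypothetical) alignment: template cell t = a*i + b.
    The result satisfies t == r*P*m + p*m + c.
    """
    if a < 1 or b < 0 or m < 1 or P < 1:
        raise ValueError("expected a >= 1, b >= 0, m >= 1, P >= 1")
    t = a * i + b
    block, c = divmod(t, m)  # block number and offset within the block
    r, p = divmod(block, P)  # blocks are dealt round-robin over P processors
    return Position(r, c, p)


if __name__ == "__main__":
    # A(0:n-1) aligned as t = 3*i + 1, distributed cyclic(4) over 2 processors.
    for i in range(6):
        r, c, p = position(i, a=3, b=1, m=4, P=2)
        assert 3 * i + 1 == r * 2 * 4 + p * 4 + c
        print(i, (r, c, p))
```

With a = 3, only every third template cell holds an element of A, which is the kind of local storage inefficiency that compression is meant to remove.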
There are various compression techniques, each with its own compression factor for rows (∆r) and columns (∆c). For a cyclic(m) distribution, the original (normalized) volume assignment is transformed into a two-deep loop nest. The outer loop enumerates the global indices of the starting points of either the rows (order = row-wise) or the columns (order = column-wise), as specified by the parameter order. In most compilers, the order of enumeration is fixed. However, we have modified the method outlined in [1] such that the generated code for both enumeration orders is identical, by adding a parameter ‘order’ to the run-time routines that
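
The idea that a single generated loop nest can serve both enumeration orders can be illustrated with a small sketch. It reuses the assumed alignment t = a·i + b of a one-dimensional array A(0:n−1) onto a cyclic(m) template over P processors. A hypothetical run-time routine `starting_points(order, ...)` returns, for processor p, one (start, step, count) triple per row or per column, so the compiled two-deep loop in `local_indices` is the same for both orders. Within a row, consecutive owned elements are a template cells apart; within a column, they recur every a/gcd(a, P·m) rows. The rule in `choose_order` (prefer the order with fewer outer iterations, i.e. longer inner loops) is an assumed stand-in for the paper's run-time selection criterion, not the authors' exact rule.

```python
from math import gcd
from typing import List, Tuple

Triple = Tuple[int, int, int]  # (first template cell, template step, element count)


def starting_points(order: str, n: int, a: int, b: int,
                    m: int, P: int, p: int) -> List[Triple]:
    """Hypothetical run-time routine: one triple per row or per column.

    Assumes A(0:n-1) is aligned as t = a*i + b (a >= 1, b >= 0) onto a
    template distributed cyclic(m) over P processors; p is this processor.
    """
    if n <= 0:
        return []
    tmax = a * (n - 1) + b       # last template cell holding an element
    rows = tmax // (P * m) + 1   # rows that can contain elements
    triples: List[Triple] = []
    if order == "row":
        for r in range(rows):
            lo = max(r * P * m + p * m, b)              # first usable cell of this row's block
            end = min(r * P * m + p * m + m, tmax + 1)  # one past the last usable cell
            t0 = lo + (b - lo) % a                      # first cell with t = b (mod a)
            if t0 < end:
                triples.append((t0, a, (end - 1 - t0) // a + 1))
    elif order == "column":
        q = a // gcd(a, P * m)   # row period of owned elements within a column
        for c in range(m):
            base = p * m + c                         # cell of column c in row 0
            r_lo = max(0, -((base - b) // (P * m)))  # first row with t >= b
            for r in range(r_lo, r_lo + q):          # one period is enough
                t0 = r * P * m + base
                if (t0 - b) % a == 0:
                    if t0 <= tmax:
                        step = q * P * m
                        triples.append((t0, step, (tmax - t0) // step + 1))
                    break
    else:
        raise ValueError("order must be 'row' or 'column'")
    return triples


def local_indices(order: str, n: int, a: int, b: int,
                  m: int, P: int, p: int) -> List[int]:
    """The 'generated' two-deep loop nest: identical code for both orders."""
    result: List[int] = []
    for t0, step, count in starting_points(order, n, a, b, m, P, p):  # outer loop
        for k in range(count):                                         # inner loop
            t = t0 + k * step
            result.append((t - b) // a)  # global array index i
    return result


def choose_order(n: int, a: int, b: int, m: int, P: int, p: int) -> str:
    """Assumed selection rule: fewer outer iterations, longer inner loops."""
    n_rows = len(starting_points("row", n, a, b, m, P, p))
    n_cols = len(starting_points("column", n, a, b, m, P, p))
    return "row" if n_rows <= n_cols else "column"


if __name__ == "__main__":
    n, a, b, m, P, p = 40, 3, 1, 4, 2, 0
    owned = sorted(i for i in range(n) if ((a * i + b) // m) % P == p)
    for order in ("row", "column"):
        assert sorted(local_indices(order, n, a, b, m, P, p)) == owned
    print(choose_order(n, a, b, m, P, p))  # prints "column"
```

In this configuration both orders visit the same indices, but the column-wise order needs only m = 4 outer iterations against 15 row-wise ones, so the assumed rule picks "column". For brevity, `choose_order` builds both lists; a real run-time system would presumably compute the two counts directly.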

Similar resources

An Implementation Framework for HPF Distributed Arrays on Message-Passing Parallel Computer Systems

Data parallel languages, like High Performance Fortran (HPF), support the notion of distributed arrays. However, the implementation of such distributed array structures and their access on message passing computers is not straightforward. This holds especially for distributed arrays that are aligned to each other and given a block-cyclic distribution. In this paper, an implementation framework ...

An Expression-Rewriting Framework to Generic Communication Sets for HPF Programs with Block-Cyclic Distribution

In this paper, we present a new framework based on expression rewritings and a calculus form called CSD calculus to generate the local enumeration set and communication set for HPF programs with Block-Cyclic distribution. Our framework is a practical software framework, and can handle the general cases so that the communication set of HPF programs of “Block-Cyclic” distributions with two-level ...

An Expression-Rewriting Framework to Generate Communication Sets for HPF Programs with Block-Cyclic Distribution

In this paper, we present a new framework based on expression rewritings and a calculus form called CSD calculus to generate the local enumeration set and communication set for HPF programs with Block-Cyclic distribution. Our framework is a practical software framework, and can handle the general cases so that the communication set of HPF programs of “Block-Cyclic” distributions with two-level ...

Finding performance bugs with the TNO HPF benchmark suite

HPF has been designed to provide portable performance on distributed memory machines. An important aspect of portable performance is the behavior of the available HPF compilers. Ideally, a programmer may expect comparable performance between different HPF compilers, given the same program and the same machine. To test the performance portability between compilers, we have designed a special ben...

Communication set generations with CSD calculus and expression-rewriting framework

In this paper, we present a new framework based on expression rewritings and a calculus form called CSD calculus to generate the local enumeration set and communication set for HPF programs with Block-Cyclic distribution. Our framework is a practical software framework, and can handle the general cases so that the communication set of HPF programs of “Block-Cyclic” distributions with two-level ...

Publication date: 2000